Bayesian Analysis of Pensacola Beach Loggerhead Sea Turtle Nesting Data Utilizing 10:1 Pseudo-Absence to Presence Ratio
Authors: Audrey Moore, MS & Laura Sikes, MPH
Instructor: Dr. Samantha Seals
Date: April 23, 2025
Bayesian vs. Frequentist Approach
Bayesian
Begins with an existing belief (Prior) about the probability distribution of the hypothesized outcome.
Introduces new data through a Likelihood for the hypothesized outcome.
Bayes' theorem combines the Prior and the Likelihood into the Posterior, an updated probability distribution for the hypothesized outcome.
The strength of the Prior and the sample size determine how much each influences the Posterior.
Frequentist
Analyzes observed data.
Results in a calculated best estimate of a parameter.
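As a minimal sketch of the prior-to-posterior update described above, the following R snippet uses a conjugate Beta-Binomial model. The prior parameters and the data counts are hypothetical values chosen only for illustration; they do not come from the turtle nesting data.

```r
# Hypothetical Beta(2, 8) prior on a success probability (assumption)
prior_a <- 2
prior_b <- 8

# Hypothetical new data: 6 successes in 20 trials (assumption)
successes <- 6
trials <- 20

# Conjugate update: add successes to a, failures to b
post_a <- prior_a + successes
post_b <- prior_b + (trials - successes)

# Compare the prior mean, the frequentist estimate, and the posterior mean
prior_mean <- prior_a / (prior_a + prior_b)
sample_prop <- successes / trials
post_mean <- post_a / (post_a + post_b)

# 80% credible interval from the Beta posterior
cred_int <- qbeta(c(0.10, 0.90), post_a, post_b)

print(c(prior = prior_mean, sample = sample_prop, posterior = post_mean))
print(cred_int)
```

The posterior mean falls between the prior mean and the sample proportion. A larger sample pulls it toward the sample proportion, while a stronger prior (larger `prior_a + prior_b`) keeps it closer to the prior mean.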
Linear Regression (Normal)
| Frequentist | Bayesian |
|-------------|----------|
| Parameters treated as fixed constants | Parameters treated as random variables |
| Does not incorporate prior information | Incorporates prior information or beliefs |
| Confidence intervals for uncertainty | Credible intervals for uncertainty |
| Hypothesis testing using p-values | Hypothesis testing by interpreting posterior probabilities |
| Results express frequencies of estimates from multiple iterations | Results express probabilities about parameter values |
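To make the contrast in the table concrete, here is a small sketch for a Normal mean with known standard deviation. The simulated data, the known `sigma`, the prior mean and standard deviation, and the threshold of 4 are all assumptions made for illustration.

```r
set.seed(1)

# Simulated data with an assumed known standard deviation
sigma <- 2
y <- rnorm(25, mean = 5, sd = sigma)
n <- length(y)
ybar <- mean(y)
se <- sigma / sqrt(n)

# Frequentist: 95% confidence interval around the fixed, unknown mean
conf_int <- ybar + c(-1, 1) * qnorm(0.975) * se

# Bayesian: Normal(prior_mean, prior_sd^2) prior on the mean (assumption)
prior_mean <- 0
prior_sd <- 10
post_prec <- 1 / prior_sd^2 + n / sigma^2
post_sd <- sqrt(1 / post_prec)
post_mean <- (prior_mean / prior_sd^2 + n * ybar / sigma^2) / post_prec

# 95% credible interval and a posterior probability statement
cred_int <- qnorm(c(0.025, 0.975), mean = post_mean, sd = post_sd)
p_gt_4 <- 1 - pnorm(4, mean = post_mean, sd = post_sd)

print(rbind(confidence = conf_int, credible = cred_int))
print(p_gt_4)
```

With a weak prior the two intervals nearly coincide numerically, but only the Bayesian output supports a direct statement such as "the probability that the mean exceeds 4 is `p_gt_4`".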
Logistic Regression
Bayesian logistic regression, similar to its frequentist counterpart, is used to model a binary categorical response variable (Y) given a set of predictor values (X). The resulting model estimates the log-odds of Y, which we can rewrite in terms of odds and probability.
The Bayesian framework takes into account prior beliefs about the regression parameters, $\beta_0$ and $\beta_1$. Since these parameters can be any number on the real line, we are able to use the Normal distribution for both priors.
Bayesian logistic regression involves updating these priors with the likelihood of the observed data so that we can make inferences about the relationship between X and Y.
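The model described above can be written out as follows. This is a sketch for a single predictor, where $\pi = P(Y = 1 \mid X)$ is notation assumed here. The prior scale of 2.5 is taken from the `normal(0, 2.5)` calls in the appendix code and is read here as a standard deviation.

```latex
\begin{aligned}
\text{log-odds:} \quad & \log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X \\
\text{odds:} \quad & \frac{\pi}{1-\pi} = e^{\beta_0 + \beta_1 X} \\
\text{probability:} \quad & \pi = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \\
\text{priors:} \quad & \beta_0 \sim N(0,\, 2.5^2), \qquad \beta_1 \sim N(0,\, 2.5^2) \\
\text{posterior:} \quad & f(\beta_0, \beta_1 \mid y) \;\propto\; f(\beta_0)\, f(\beta_1) \prod_{i=1}^{n} \pi_i^{\,y_i} (1-\pi_i)^{1-y_i}
\end{aligned}
```

Each one-unit increase in $X$ multiplies the odds of $Y = 1$ by $e^{\beta_1}$, so a positive $\beta_1$ corresponds to increasing odds and a negative $\beta_1$ to decreasing odds.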
Unadjusted Models:
- The 80% credible interval (CI) for foreshore slope is above 0, suggesting the chance of nesting increases as the foreshore slope increases.
- The 80% CI for nest elevation is below 0, suggesting the chance of nesting decreases as the nest elevation increases.

Adjusted Model:
- After adjusting for all predictors, we see similar results:
  - Higher foreshore slope is associated with a higher probability of nesting.
  - Higher elevation is associated with a lower probability of nesting.
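As a rough illustration of how a log-odds coefficient translates into these statements, the sketch below takes the unadjusted foreshore slope estimates reported in the appendix (intercept -2.952, slope 0.112, 80% CI 0.017 to 0.205) and converts them to odds ratios and predicted probabilities. The foreshore slope values in `fs` are hypothetical inputs, not observed data.

```r
# Estimates copied from the unadjusted foreshore slope model in the appendix
b0 <- -2.952
b1 <- 0.112
ci <- c(lower = 0.017, upper = 0.205)

# Exponentiate to move from the log-odds scale to the odds-ratio scale
odds_ratio <- exp(b1)
odds_ratio_ci <- exp(ci)

# Predicted nesting probability at hypothetical foreshore slopes (assumption)
fs <- c(0, 5, 10)
prob <- plogis(b0 + b1 * fs)

print(odds_ratio)
print(odds_ratio_ci)
print(data.frame(foreshore_slope = fs, probability = prob))
```

Because the whole interval for the slope lies above 0, the whole interval for the odds ratio lies above 1, which is the odds-scale version of "the chance of nesting increases as the foreshore slope increases".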
APPENDIX A
CODE
---
execute:
  echo: false
  warning: false
  message: false
  error: false
format:
  revealjs:
    theme: serif
    embed-resources: true
    slide-number: true
    width: 1200
    height: 900
    df-print: paged
    html-math-method: katex
    self-contained: true
editor: source
pdf-separate-fragments: true
fig-align: center
css: |
  /* Apply font size to all tables within the document */
  .reveal table {
    font-size: 12px; /* Adjust the font size */
  }
  /* Ensure that table headers and cells also use the new font size */
  .reveal table th, .reveal table td {
    font-size: 12px; /* Apply the same font size to headers and data cells */
    padding: 5px; /* Optional: Adjust padding for readability */
  }
---

```{r}
#install.packages("gsheet")
library(gsheet)
library(bayesrules)
library(rstanarm)
library(bayesplot)
library(tidyverse)
library(broom.mixed)
library(tidybayes)
```

## Bayesian Analysis of Pensacola Beach Loggerhead Sea Turtle Nesting Data Utilizing 10:1 Pseudo-Absence to Presence Ratio

Authors: Audrey Moore, MS & Laura Sikes, MPH<br>
Instructor: Dr. Samantha Seals <br>
Date: April 23, 2025<br>

## Bayesian vs. Frequentist Approach

**Bayesian**

- Begins with an existing belief (Prior) about the probability distribution of the hypothesized outcome.
- Introduces new data through a Likelihood for the hypothesized outcome.
- Bayes' theorem combines the Prior and the Likelihood into the Posterior, an updated probability distribution for the hypothesized outcome.
- The strength of the Prior and the sample size determine how much each influences the Posterior.

**Frequentist**

- Analyzes observed data.
- Results in a calculated best estimate of a parameter.

## Linear Regression (Normal)

| Frequentist | Bayesian |
|--------------------------------------|----------------------------------|
| Parameters treated as fixed constants | Parameters treated as random variables |
| Does not incorporate prior information | Incorporates prior information or beliefs |
| Confidence intervals for uncertainty | Credible intervals for uncertainty |
| Hypothesis testing using *p*-values | Hypothesis testing by interpreting posterior probabilities |
| Results express frequencies of estimates from multiple iterations | Results express probabilities about parameter values |

## Logistic Regression

- Bayesian logistic regression, similar to its frequentist counterpart, is used to model a binary categorical response variable (Y) given a set of predictor values (X). The resulting model estimates the log-odds of Y, which we can rewrite in terms of odds and probability.
- The Bayesian framework takes into account prior beliefs about the regression parameters, $\beta_0$ and $\beta_1$. Since these parameters can be any number on the real line, we are able to use the Normal distribution for both priors.
- Bayesian logistic regression involves updating these priors with the likelihood of the observed data so that we can make inferences about the relationship between X and Y.

## Unadjusted Analyses

```{r}
data10to1 <- gsheet2tbl("https://docs.google.com/spreadsheets/d/1ARgHYUwclO5weZf5lE1f79V8otZdc4CLSpdCSmwP8Ts/edit?gid=1512758558#gid=1512758558")
test_wo_drop <- data10to1 %>% na.omit()

data10to1 <- gsheet2tbl("https://docs.google.com/spreadsheets/d/1ARgHYUwclO5weZf5lE1f79V8otZdc4CLSpdCSmwP8Ts/edit?gid=1512758558#gid=1512758558") %>%
  select(-do_not_use_FS, -do_not_use_BS)
test_w_drop <- data10to1 %>% na.omit()
```

```{r}
# Simulate the prior distribution
turtle_model_prior1 <- stan_glm(nested ~ beach_slope,
                                data = test_w_drop, family = binomial,
                                prior_intercept = normal(0, 2.5),
                                prior = normal(0, 2.5),
                                chains = 4, iter = 5000*2, seed = 120189,
                                prior_PD = TRUE)
# Update to simulate the posterior distribution
turtle_model1 <- update(turtle_model_prior1, prior_PD = FALSE)

# Simulate the prior distribution
turtle_model_prior2 <- stan_glm(nested ~ dune_ht,
                                data = test_w_drop, family = binomial,
                                prior_intercept = normal(0, 2.5),
                                prior = normal(0, 2.5),
                                chains = 4, iter = 5000*2, seed = 120189,
                                prior_PD = TRUE)
# Update to simulate the posterior distribution
turtle_model2 <- update(turtle_model_prior2, prior_PD = FALSE)

# Simulate the prior distribution
turtle_model_prior3 <- stan_glm(nested ~ foreshore_slope,
                                data = test_w_drop, family = binomial,
                                prior_intercept = normal(0, 2.5),
                                prior = normal(0, 2.5),
                                chains = 4, iter = 5000*2, seed = 120189,
                                prior_PD = TRUE)
# Update to simulate the posterior distribution
turtle_model3 <- update(turtle_model_prior3, prior_PD = FALSE)

# Simulate the prior distribution
turtle_model_prior4 <- stan_glm(nested ~ nest_dist,
                                data = test_w_drop, family = binomial,
                                prior_intercept = normal(0, 2.5),
                                prior = normal(0, 2.5),
                                chains = 4, iter = 5000*2, seed = 120189,
                                prior_PD = TRUE)
# Update to simulate the posterior distribution
turtle_model4 <- update(turtle_model_prior4, prior_PD = FALSE)

# Simulate the prior distribution
turtle_model_prior5 <- stan_glm(nested ~ nest_elev,
                                data = test_w_drop, family = binomial,
                                prior_intercept = normal(0, 2.5),
                                prior = normal(0, 2.5),
                                chains = 4, iter = 5000*2, seed = 120189,
                                prior_PD = TRUE)
# Update to simulate the posterior distribution
turtle_model5 <- update(turtle_model_prior5, prior_PD = FALSE)

# Simulate the prior distribution
turtle_model_prior6 <- stan_glm(nested ~ beach_slope + dune_ht + foreshore_slope + nest_dist + nest_elev,
                                data = test_w_drop, family = binomial,
                                prior_intercept = normal(0, 2.5),
                                prior = normal(0, 2.5),
                                chains = 4, iter = 5000*2, seed = 120189,
                                prior_PD = TRUE)
# Update to simulate the posterior distribution
turtle_model6 <- update(turtle_model_prior6, prior_PD = FALSE)
```

```{r}
posterior_interval(turtle_model1, prob = 0.80)
posterior_interval(turtle_model2, prob = 0.80)
posterior_interval(turtle_model3, prob = 0.80)
posterior_interval(turtle_model4, prob = 0.80)
posterior_interval(turtle_model5, prob = 0.80)
posterior_interval(turtle_model6, prob = 0.80)

tidy(turtle_model1)
tidy(turtle_model2)
tidy(turtle_model3)
tidy(turtle_model4)
tidy(turtle_model5)
tidy(turtle_model6)
```

| Predictor | Model | 80% CI<br>(Lower) | 80% CI<br>(Upper) |
|:------------|:----------------------------------------------------------------------|-------------------|-------------------|
|<small>B. Slope</small>|<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-2.246+0.011\text{BS}$</small>|<small>-0.160</small>|<small>-0.169</small>|
|<small>D. Height</small>|<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-1.364-0.234\text{DH}$</small>|<small>-0.536</small>|<small>0.021</small>|
|<small>F. Slope</small>|<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-2.952+0.112\text{FS}$</small>|<small>0.017</small>|<small>0.205</small>|
|<small>N. Dist.</small>|<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-0.018+0.016\text{ND}$</small>|<small>-0.040</small>|<small>0.002</small>|
|<small>N. Elev.</small>|<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-0.483+0.306\text{NE}$</small>|<small>-0.891</small>|<small>-0.102</small>|

## Adjusted Analyses

<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-0.963-0.074\text{BS} -0.232\text{DH} +0.178\text{FS} +0.012\text{ND} -0.980\text{NE}$</small>

| Predictor | Adjusted 80% CI (Lower) | Adjusted 80% CI (Upper) |
|-----------------|------------------------:|------------------------:|
| Beach Slope | -0.314 | 0.144 |
| Dune Height | -0.555 | 0.038 |
| Foreshore Slope | 0.073 | 0.287 |
| Nest Distance | -0.039 | 0.057 |
| Nest Elevation | -1.822 | -0.100 |

## Diagnostics

```{r}
# rhat and neff ratio
neff_ratio(turtle_model1)
rhat(turtle_model1)
neff_ratio(turtle_model2)
rhat(turtle_model2)
neff_ratio(turtle_model3)
rhat(turtle_model3)
neff_ratio(turtle_model4)
rhat(turtle_model4)
neff_ratio(turtle_model5)
rhat(turtle_model5)
neff_ratio(turtle_model6)
rhat(turtle_model6)

mcmc_trace(turtle_model1, size = 0.1)
mcmc_trace(turtle_model2, size = 0.1)
mcmc_trace(turtle_model3, size = 0.1)
mcmc_trace(turtle_model4, size = 0.1)
mcmc_trace(turtle_model5, size = 0.1)
mcmc_trace(turtle_model6, size = 0.1)

mcmc_dens_overlay(turtle_model1)
mcmc_dens_overlay(turtle_model2)
mcmc_dens_overlay(turtle_model3)
mcmc_dens_overlay(turtle_model4)
mcmc_dens_overlay(turtle_model5)
mcmc_dens_overlay(turtle_model6)
```

## First Alternative Model Considered

```{r}
# Simulate the prior distribution
turtle_model_prior7 <- stan_glm(nested ~ beach_slope + dune_ht + foreshore_slope + nest_dist + nest_elev + nest_dist*nest_elev,
                                data = test_w_drop, family = binomial,
                                prior_intercept = normal(0, 2.5),
                                prior = normal(0, 2.5),
                                chains = 4, iter = 5000*2, seed = 120189,
                                prior_PD = TRUE)
# Update to simulate the posterior distribution
turtle_model7 <- update(turtle_model_prior7, prior_PD = FALSE)

tidy(turtle_model7)
posterior_interval(turtle_model7, prob = 0.80)
```

<small>$\text{logit}(\pi_i) =-1.371-0.053\text{BS} -0.208\text{DH} +0.173\text{FS} +0.042\text{ND} -0.820\text{NE} -0.015\text{ND}\times\text{NE}$</small>

| Predictor | Adjusted 80% CI (Lower) | Adjusted 80% CI (Upper) |
|-------------------------------|-------------------:|-------------------:|
| Beach Slope | -0.298 | 0.184 |
| Dune Height | -0.528 | 0.060 |
| Foreshore Slope | 0.067 | 0.281 |
| Nest Distance | -0.036 | 0.128 |
| Nest Elevation | -1.741 | 0.109 |
| Nest Distance $\times$ Nest Elevation | -0.048 | 0.013 |

## Second Alternative Model Considered

```{r}
# Simulate the prior distribution
turtle_model_prior8 <- stan_glm(nested ~ foreshore_slope + nest_elev + foreshore_slope:nest_elev,
                                data = test_w_drop, family = binomial,
                                prior_intercept = normal(0, 2.5),
                                prior = normal(0, 2.5),
                                chains = 4, iter = 5000*2, seed = 120189,
                                prior_PD = TRUE)
# Update to simulate the posterior distribution
turtle_model8 <- update(turtle_model_prior8, prior_PD = FALSE)

posterior_interval(turtle_model8, prob = 0.80)
```

| Predictor | Model | 80% CI<br>(Lower) | 80% CI<br>(Upper) |
|:------------|:----------------------------------------------------------------------|------------------:|------------------:|
|<small>B. Slope</small>|<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-2.246+0.011\text{BS}$</small>|<small>-0.160</small>|<small>-0.169</small>|
|<small>D. Height</small>|<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-1.364-0.234\text{DH}$</small>|<small>-0.536</small>|<small>0.021</small>|
|<small>F. Slope</small>|<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-2.952+0.112\text{FS}$</small>|<small>0.017</small>|<small>0.205</small>|
|<small>N. Dist.</small>|<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-0.018+0.016\text{ND}$</small>|<small>-0.040</small>|<small>0.002</small>|
|<small>N. Elev.</small>|<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) =-0.483+0.306\text{NE}$</small>|<small>-0.891</small>|<small>-0.102</small>|

## Conclusions

Unadjusted Models:

- The 80% CI for foreshore slope is above 0, suggesting the chance of nesting increases as the foreshore slope increases.
- The 80% CI for nest elevation is below 0, suggesting the chance of nesting decreases as the nest elevation increases.

Adjusted Model:

- After adjusting for all predictors, we see similar results:
  - Higher foreshore slope is associated with a higher probability of nesting
  - Higher elevation is associated with a lower probability of nesting

## Code
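The five unadjusted models in the code above differ only in their predictor, so the repeated `stan_glm()` calls could be expressed more compactly. The following is one possible sketch, not the authors' method. The names `predictors`, `formulas`, `fit_turtle_model`, and `unadjusted_fits` are hypothetical, and unlike the original code it fits the posterior directly instead of simulating the prior first and then calling `update()`. The fitting step is guarded so the snippet still runs when `rstanarm` or the `test_w_drop` data frame is unavailable.

```r
# Predictor column names as used in the models above
predictors <- c("beach_slope", "dune_ht", "foreshore_slope",
                "nest_dist", "nest_elev")

# Build one formula per predictor, e.g. nested ~ beach_slope
formulas <- lapply(predictors, function(p) reformulate(p, response = "nested"))
names(formulas) <- predictors

# Hypothetical helper: same priors, chains, iterations, and seed as above
fit_turtle_model <- function(formula, data) {
  rstanarm::stan_glm(
    formula,
    data = data, family = binomial,
    prior_intercept = rstanarm::normal(0, 2.5),
    prior = rstanarm::normal(0, 2.5),
    chains = 4, iter = 5000 * 2, seed = 120189
  )
}

# Fit only when the package and the cleaned data are both available
if (requireNamespace("rstanarm", quietly = TRUE) && exists("test_w_drop")) {
  unadjusted_fits <- lapply(formulas, fit_turtle_model, data = test_w_drop)
  intervals <- lapply(unadjusted_fits, rstanarm::posterior_interval, prob = 0.80)
  print(intervals)
}
```

Keeping the fits in a named list also lets the diagnostics be applied in one pass, for example `lapply(unadjusted_fits, bayesplot::rhat)`.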